Sequence Classification in the Jensen-Shannon Embedding
Abstract
This paper presents a novel approach to the supervised classification of structured objects such as sequences, trees and graphs, when the input instances are characterized by probability distributions. Distances between distributions are computed via the Jensen-Shannon (JS) divergence, which offers several advantages over the L2 distance or the Kullback-Leibler divergence. The JS divergence induces an embedding of the distributions into a real Hilbert space. A general approach is proposed here to derive a positive definite kernel from any conditionally negative definite (CND) distance, the JS divergence being a particular case of interest. We show how to compute the dot product in the embedding induced by a CND distance, based solely on the distance matrix between training points, and we detail how new points can be added to this embedding. The JS kernel is applied to sequence classification problems. Two kinds of empirical distributions are considered: (i) the N-gram distributions and (ii) the distributions of the First Passage Times (FPT) between occurrences of substrings. Experimental results on DNA splicing junction detection and protein function prediction illustrate that . . .
Preliminary work. Under review by the International Conference on Machine Learning (ICML). Do not distribute.
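The pipeline outlined in the abstract (N-gram distributions, JS divergence, dot products recovered from the distance matrix alone) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, base-2 logarithms are an assumed convention, and the double-centering formula K = -1/2 J D J is the classical construction for turning a CND distance matrix into a Gram matrix, which may differ in detail from the paper's derivation. Adding new points to the embedding is not covered here.

```python
import math
from collections import Counter


def ngram_distribution(seq, n):
    """Empirical distribution of the length-n substrings of seq (hypothetical helper)."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def _kl_to_mixture(p, m):
    # KL(p || m) in bits; m is positive wherever p is, so the ratio is well defined.
    return sum(pv * math.log2(pv / m[g]) for g, pv in p.items() if pv > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions given as dicts."""
    support = set(p) | set(q)
    m = {g: 0.5 * (p.get(g, 0.0) + q.get(g, 0.0)) for g in support}
    return 0.5 * _kl_to_mixture(p, m) + 0.5 * _kl_to_mixture(q, m)


def centered_kernel(dist):
    """Gram matrix K = -1/2 * J D J, with J = I - (1/n) 11^T (assumed construction).

    dist must be a symmetric matrix (list of lists) with a zero diagonal.
    """
    n = len(dist)
    row_mean = [sum(row) / n for row in dist]
    total_mean = sum(row_mean) / n
    return [
        [-0.5 * (dist[i][j] - row_mean[i] - row_mean[j] + total_mean)
         for j in range(n)]
        for i in range(n)
    ]


# Toy DNA-like sequences, chosen only for illustration.
seqs = ["ACGTACGT", "ACGTTCGA", "GGGGCCCC"]
dists = [ngram_distribution(s, 2) for s in seqs]
D = [[js_divergence(p, q) for q in dists] for p in dists]
K = centered_kernel(D)
```

By construction, K[i][i] + K[j][j] - 2 K[i][j] equals D[i][j], so the JS divergence plays the role of a squared Euclidean distance in the induced embedding. K can then be passed to any kernel method that accepts a precomputed Gram matrix.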
Similar Resources
A Graph Embedding Method Using the Jensen-Shannon Divergence
Riesen and Bunke recently proposed a novel dissimilarity-based approach for embedding graphs into a vector space. One drawback of their approach is the computational cost of the graph edit operations required to compute the dissimilarity between graphs. In this paper we explore whether the Jensen-Shannon divergence can be used as a means of computing a fast similarity measure between a pair of graphs. We ...
Graph Characteristics from the Quantum Jensen-Shannon Graph Kernel
In this paper, we use the quantum Jensen-Shannon divergence as a means to establish the similarity between a pair of graphs and to develop a novel graph kernel. In quantum theory, the quantum Jensen-Shannon divergence is defined as a distance measure between quantum states. In order to compute the quantum Jensen-Shannon divergence between a pair of graphs, we first need to associate a density o...
A New Document Embedding Method for News Classification
Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A large volume of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...
Transitive State Alignment for the Quantum Jensen-Shannon Kernel
Kernel methods provide a convenient way to apply a wide range of learning techniques to complex and structured data by shifting the representational problem from one of finding an embedding of the data to that of defining a positive semidefinite kernel. One problem with the most widely used kernels is that they neglect the locational information within the structures, resulting in less discrimi...
Connected Component Based Word Spotting on Persian Handwritten Image Documents
Word spotting makes unindexed image documents searchable by locating one or more words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...